Abstract


This report presents some results and findings of our work on very-low-bit-rate
video compression systems using vector quantization (VQ). We
have  identified  multiscale  segmentation  and  variable-rate  coding  as  two 
important  concepts  whose  effective  use  can  lead  to  superior  compression 
performance.  Two  VQ  algorithms  that  attempt  to  use  these  two  aspects 
are  presented:  one  based  on  residual  vector  quantization  and  the  other  on 
quadtree  vector  quantization.  Residual  vector  quantization  is  a  successive 
approximation  quantizer  technique  and  is  ideal  for  variable-rate  coding. 
Quadtree  vector  quantization  is  inherently  a  multiscale  coding  method. 
The  report  presents  the  general  theoretical  formulation  of  these  algorithms, 
as  well  as  quantitative  performance  of  sample  implementations. 
1. Introduction: Very-Low-Bit-Rate Video Coding for the Digital Battlefield


Battlefield  digitization — the  process  of  representing  all  components  of  a 
battlefield  in  digital  form — allows  the  battlefield  and  its  components  to 
be  visualized,  simulated,  and  processed  on  computer  systems,  making  the 
Army more deadly and reducing the use of physical resources. For complete
battlefield digitization, images of various modalities must be gathered
by different imaging techniques. These images are then processed to
provide  important  information  about  the  imaged  areas. 

An important class of visual data is the image sequence or video, and
forward-looking infrared (FLIR) video is an important source of information.
These data consist of a series of two-dimensional images captured
at a constant temporal rate. The main drawback to effective use of this
data source is the huge amount of raw digital data (bits) required to represent
it. This volume of data makes real-time gathering and transmission
over tactical internets impractical.

To effectively combat this problem, data compression is used: that is, techniques
to reduce the number of bits required to represent the data. The
large  compression  ratios  needed  to  "squeeze"  video  over  low-bandwidth 
digital  channels  require  the  use  of  "lossy"  image  compression  techniques. 
Lossy  compression  techniques  use  a  very  small  number  of  bits  to  represent 
the  data  at  the  cost  of  degraded  information.  These  techniques  require  a 
trade-off  between  video  quality  and  bit-rate  constraints. 

For  intelligent  compression  of  FLIR  video  images,  the  bit  assignment  should 
be made so that more bits are assigned to active areas, while fewer are assigned
to passive background areas. Vector quantization (a block quantization
technique) has this type of adaptability, so that it is highly suitable for
compressing  FLIR  video. 

In the work reported here, we systematically study the use of vector quantization
for compressing video sequences. Results are shown for both FLIR
video and regular gray-scale video. (A single representative frame of the
original FLIR video scene and its compressed representation are shown in
fig. 1.) We specifically study two adaptive vector quantization techniques:
the residual vector quantizer (VQ) and the quadtree VQ. These two techniques
permit the encoding of sources at different levels of precision depending
on content.




Figure  1.  Original  (top)  and  compressed  (bottom)  representations  of  a  frame  from  a  FLIR 
video  sequence. 

The  targeted  bit  rate  is  in  the  very  low  (5  to  16  kb/s)  range;  this  rate  allows 
the compressed video to be transmitted over SINCGARS (Single Channel
Ground and Airborne Radio System) channels, as well as permitting multiple video
streams  to  be  multiplexed  and  transmitted  over  Fractional  T1  lines.  Such 
multiplex  transmission  would  allow  the  video  to  be  collected  by  sources 
such  as  unmanned  airborne  vehicles  (UAVs)  and  transmitted  to  processing 
centers  in  real  time. 
2. Background


Images are represented in the digital domain by a matrix/array of intensity
values, and video sequences are represented by a series of matrices. These
matrices are often large, requiring large amounts of storage space and/or
transmission bandwidth. When resources (storage space or bandwidth) are
limited, it is essential to reduce the amount of data necessary for representing
the digital imagery. Data can be reduced (compressed) either with
no loss in data (lossless compression) or with some degradation/distortion
of the data (lossy compression). In lossless compression, redundancy in the
data is removed, resulting in a smaller representation, but the compression
ratio that can be achieved is small. On the other hand, lossy compression
techniques trade off the compression ratio against the tolerated distortion.
Most current image- and video-compression techniques are lossy,
since human visual perception can tolerate a certain amount of distortion in
the presented visual data. Lossy compression can be achieved by quantization,
in which data are represented at lower numerical precision than in the
original representation.


2.1  Quantization 


A quantizer $Q$ of a random variable $X \in \mathcal{R}$ is a mapping from
$\mathcal{R}$ to $C$, a finite subset of $\mathcal{R}$:

$$Q : \mathcal{R} \to C, \quad C \subset \mathcal{R}. \qquad (1)$$

The cardinality $N_C$ of the set $C$ gives the number of quantization levels. The
mapping $Q$ is generally a staircase function, as shown in figure 2, where
$\mathcal{R}$ is divided into $N_C$ segments $[b_{i-1}, b_i)$, $i = 1, \ldots, N_C$.
Each $X_n \in [b_{i-1}, b_i)$ is mapped to $c_i \in C$, where $c_i$ is the
reconstruction value.

A sequence of random variables $X_n$ can be quantized by two different
methods. The first method involves each individual member of the sequence
being quantized separately by the quantizer $Q$ defined above. This
method is called scalar quantization. In the second method, the sequence is
grouped into blocks of adjacent members, and each block (a vector) is quantized
by a vector quantizer. In the work reported here, vector quantization
(sect. 3) is applied to video compression.
Figure 2. Scalar quantizer Q of a random variable.
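The staircase mapping of section 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quantizer is taken to be uniform, the reconstruction values are taken to be segment midpoints, and the function names `make_uniform_quantizer` and `quantize` are hypothetical. The report itself does not prescribe how the boundaries $b_i$ or the values $c_i$ are chosen.

```python
from bisect import bisect_right


def make_uniform_quantizer(lo, hi, levels):
    """Build boundaries b_0..b_N and reconstruction values c_1..c_N.

    Assumption: uniform segments on [lo, hi) with midpoint reconstruction.
    """
    step = (hi - lo) / levels
    boundaries = [lo + i * step for i in range(levels + 1)]
    codebook = [lo + (i + 0.5) * step for i in range(levels)]
    return boundaries, codebook


def quantize(x, boundaries, codebook):
    """Map x to the reconstruction value of the segment [b_{i-1}, b_i) containing it."""
    i = bisect_right(boundaries, x) - 1
    # Inputs outside [lo, hi) are clamped to the first or last segment.
    i = min(max(i, 0), len(codebook) - 1)
    return codebook[i]


boundaries, codebook = make_uniform_quantizer(0.0, 8.0, 4)
samples = [0.3, 1.99, 2.0, 7.5]
quantized = [quantize(x, boundaries, codebook) for x in samples]
```

Applying `quantize` to each member of a sequence independently corresponds to scalar quantization; grouping members into blocks before quantizing is the vector case treated in section 3.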

2.2  Video  Compression 

A  video  sequence  is  a  three-dimensional  signal  of  light  intensity,  with  two 
spatial  dimensions  and  a  temporal  dimension.  A  digital  video  sequence  is 
a  three-dimensional  signal  that  is  suitably  sampled  in  all  three  dimensions; 
it  is  in  the  form  of  a  three-dimensional  matrix  of  intensity  values.  A  typical 
video  sequence  has  a  significant  amount  of  correlation  between  neighbors 
in  all  three  dimensions.  The  type  of  correlation  in  the  temporal  dimension 
is  significantly  different  from  that  in  the  spatial  dimensions. 

There are many different approaches to video compression, and some international
compression standards have been established. Among the different
approaches is a class of algorithms that first attempt to remove correlations
in the temporal domain and then deal with removing correlations
in the spatial dimensions. Among these is motion compensation (MC), a
popular technique to remove the correlations in the temporal domain. Motion
compensation results in a residue sequence, which is then quantized
by two-dimensional quantization techniques similar to those used for compressing
still images.

2.3  Motion  Compensation 

A video scene usually contains some motion of objects, occlusion/exposure
of areas due to such motion, and some deformation. The rate of these
changes is typically much smaller than the frame rate (i.e., the rate of sampling
in the temporal dimension). Therefore, there is very little change between
two adjacent frames. A motion-compensation algorithm exploits this
consistency to approximate the current frame by using pieces from the
previous frame. The result is a reasonable approximation of the current
frame based on the previous one, with some side information in the form
of motion vectors. The difference between the approximation of the current
frame and the actual frame is quantized by a set of scalar or vector
quantizers.
In figure 3, which shows the block diagram of the encoder and decoder, the
difference between the approximation and the original is quantized by the
quantizer $Q$. The encoder is a closed-loop system, as shown in the block
diagram; it contains both a quantizer $Q$ and an inverse quantizer $Q^{-1}$. Since
the encoder uses the previous frame to approximate the current frame, the
decoder needs the previous frame to generate the current frame. The decoder
has only the quantized version of the previous frame and not the
original frame. The inverse quantizer $Q^{-1}$ in the encoder duplicates the
decoder states at the encoder and gives the encoder access to a quantized
version of the previous frame. The encoder uses this quantized version
of the previous frame to generate an approximation of the current frame.
This ensures that the approximation of the current frame generated from
the previous frame is the same at both the encoder and decoder.

Figure 4 shows the entropy of a sequence after (1) decorrelation by taking the frame
difference and (2) decorrelation using block motion estimation. It can be
seen that the entropy in this case is reduced by a factor of two compared
to the original frame entropy. Between frames 105 and 120, the entropy of
the residue due to motion estimation is significantly less than the entropy
of the residue from frame differencing. This difference is due to significant
motion of objects in the images during that period.
Figure 3. Motion compensation (MC) for video coding.


Figure  4.  Entropy  of  a  sequence  after  decorrelation  in  temporal  dimension. 
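As a rough illustration of the motion-compensation step of section 2.3, the sketch below approximates each block of the current frame by a displaced block of a reference frame and returns the motion vectors together with the residue that would then be quantized. The full-search strategy, the sum-of-absolute-differences criterion, the block size, the search range, and the names `sad` and `motion_compensate` are all assumptions made for illustration; the report does not specify them. In the closed-loop encoder, the reference frame would be the quantized reconstruction of the previous frame.

```python
def sad(cur, ref, by, bx, dy, dx, bs):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_compensate(cur, ref, bs=2, search=1):
    """Full-search block matching (assumes frame sizes are multiples of bs).

    Returns (vectors, residual): vectors maps each block origin (by, bx) to its
    displacement (dy, dx); residual is cur minus the motion-compensated prediction.
    """
    h, w = len(cur), len(cur[0])
    vectors = {}
    residual = [[0] * w for _ in range(h)]
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    inside = (0 <= by + dy and by + dy + bs <= h
                              and 0 <= bx + dx and bx + dx + bs <= w)
                    if not inside:
                        continue  # displaced block would leave the frame
                    cost = sad(cur, ref, by, bx, dy, dx, bs)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            _, dy, dx = best
            vectors[(by, bx)] = (dy, dx)
            for y in range(bs):
                for x in range(bs):
                    residual[by + y][bx + x] = (
                        cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return vectors, residual


# Toy frames: the current frame is the reference shifted right by one pixel.
ref_frame = [[4 * y + x for x in range(4)] for y in range(4)]
cur_frame = [[row[0]] + row[:-1] for row in ref_frame]
vectors, residual = motion_compensate(cur_frame, ref_frame)
```

Blocks whose content is fully explained by a displacement yield a zero residue, which is why the entropy of the motion-compensated residue can fall below that of a plain frame difference when objects move.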
3. Vector Quantization


A vector quantizer $Q$ is a mapping from a point in $k$-dimensional Euclidean
space $\mathcal{R}^k$ into a finite subset $C$ of $\mathcal{R}^k$ containing $N$
reproduction points or vectors:

$$Q : \mathcal{R}^k \to C.$$

The $N$ reproduction vectors are called codevectors, and the set $C$ is called the
codebook: $C = (c_1, c_2, \ldots, c_N)$, and $c_i \in \mathcal{R}^k$ for each
$i \in \mathcal{I} = \{1, 2, \ldots, N\}$.
The codebook $C$ has $N$ distinct members. The rate of the VQ, $r = \log_2(N)$,
measures the number of bits required to index a member of the codebook.

A VQ partitions the space $\mathcal{R}^k$ into cells $\mathcal{R}_i$:

$$\mathcal{R}_i = \{X \in \mathcal{R}^k : Q(X) = c_i\} \quad \forall i \in \mathcal{I}.$$

These cells represent the pre-image of the points $c_i$ under the mapping $Q$,
i.e., $\mathcal{R}_i = Q^{-1}(c_i)$. These cells have the following properties:

$$\bigcup_i \mathcal{R}_i = \mathcal{R}^k,$$

$$\mathcal{R}_i \cap \mathcal{R}_j = \emptyset \quad \forall i \neq j.$$

These properties imply that the cells are disjoint and that they cover the
entire space $\mathcal{R}^k$.

The  VQs  dealt  with  here  have  the  following  additional  properties: 

•  They are regular. The cells of a regular VQ, $\mathcal{R}_i$, are convex, and
$c_i \in \mathcal{R}_i$.

•  They are polytopal. The cells of a polytopal VQ are polytopal. Polytopes
are geometric regions bounded by hyperplane surfaces. A polytopal
region is the intersection of a finite number of half-spaces.

•  They are bounded. A VQ is bounded if it is defined on a bounded
domain $B \subset \mathcal{R}^k$; i.e., every input vector $X$ lies in $B$.

A VQ consists of two operators, an encoder and a decoder. The encoder
$\gamma$ associates every input vector $X$ with $i$, which is some member of the
index set $\mathcal{I}$. The decoder $\beta$ associates the index $i$ with $c_i$,
some member of the reproduction set $C$:

$$\gamma : \mathcal{R}^k \to \mathcal{I},$$

$$\beta : \mathcal{I} \to C,$$

$$Q(X) = \beta(\gamma(X)).$$


The block diagram of the VQ is shown in figure 5. The encoding operation
is completely determined by the partition of the input space. The encoder
identifies the cell to which a given input vector belongs. The decoding operation
is determined by the codebook. Given the cell to which the input
vector belongs, the decoder determines the reproduction vector that best
represents the input vector. The decoder is very often in the form of a simple
lookup table. Given the index, the table returns the vector entry corresponding
to the index.
Figure 5. Encoder/decoder model of a VQ.
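The encoder/decoder decomposition $Q(X) = \beta(\gamma(X))$ can be sketched as follows. The encoder here is a nearest-neighbor search under squared error (anticipating the condition derived in section 3.2.1), and the decoder is a table lookup. The function names and the toy codebook are hypothetical and chosen only for illustration.

```python
def squared_error(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def vq_encode(x, codebook):
    """Encoder gamma: return the index of the cell (nearest codevector) containing x."""
    return min(range(len(codebook)), key=lambda i: squared_error(x, codebook[i]))


def vq_decode(index, codebook):
    """Decoder beta: a lookup table from index to reproduction vector."""
    return codebook[index]


# Toy codebook with N = 4 codevectors in 2 dimensions (rate r = log2(4) = 2 bits).
codebook = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
index = vq_encode((3.2, 0.5), codebook)
reproduction = vq_decode(index, codebook)
```

Only `index` needs to cross the channel; the decoder recovers the reproduction vector from its own copy of the codebook.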


3.1  Quantization  Error  of  Vector  Quantizers 

The performance of a VQ can be evaluated by the average distortion introduced
by encoding a set of training input vectors. Ideally, the distortion
should be zero; in practice, the output of the decoder should be a close
representation of the input vector. The expected value of the distortion measure
represents the performance of the quantizer:

$$\mathcal{D} = E[d(X, Q(X))],$$

where $d(X, Q(X))$ represents the distortion introduced by the quantizer for
the input vector $X$.

One important distortion measure is the squared-error distortion measure
(Euclidean distortion/L2 distortion). This distortion measure is especially
relevant to image coding problems, where the mean squared error is widely
used as a quantitative measure of the performance of coding:

$$d(X, Q(X)) = \|X - Q(X)\|^2,$$

$$\mathcal{D} = E[\|X - Q(X)\|^2].$$

Other distortion measures include the weighted squared-error distortion
measure, the Mahalanobis distortion measure, and the Itakura-Saito distortion
measure.
3.2 Optimality Conditions for Vector Quantizers

An optimal VQ is one that minimizes the overall distortion measure for
input vectors $X$ with probability distribution $P(X)$. A VQ has to satisfy two
optimality conditions to achieve this minimum distortion:

•  For a given fixed decoder $\beta$, the encoder $\gamma$ should be the one that
minimizes the overall distortion.

•  For a given fixed encoder $\gamma$, the decoder $\beta$ should be the best possible
decoder.

3.2.1  Nearest-Neighbor  Condition 

Given a decoder, it is necessary to find the best possible encoder. The decoder
contains a finite set of vectors $C$, one of which is used to represent the
input vector. For a given vector $X$, the vector $c_i$ is the nearest neighbor if

$$d(X, c_i) \le d(X, c_j) \quad \forall c_j \in C.$$

The overall distortion for a given fixed codebook $C$ is given by

$$\mathcal{D} = E[d(X, Q(X))],$$

$$\mathcal{D} = \int d(X, Q(X)) P(X)\, dX;$$

clearly,

$$\int d(X, Q(X)) P(X)\, dX \ge \int d(X, c_i) P(X)\, dX,$$

where $c_i$ is the nearest neighbor of $X$. Therefore, the best possible encoder
for a given decoder is the nearest-neighbor encoder.

3.2.2  Centroid  Condition 

For a fixed encoder, it is necessary to find the reproduction codebook that
minimizes the overall distortion. For a given cell $\mathcal{R}_i$, the centroid
$c_i$ is defined as the point satisfying

$$\int_{\mathcal{R}_i} d(X, c_i) P(X)\, dX \le \int_{\mathcal{R}_i} d(X, c) P(X)\, dX \quad \forall c \in \mathcal{R}_i, \ c_i \in \mathcal{R}_i.$$

For a given probability distribution, and for a given encoder, the overall
distortion is given by

$$\mathcal{D} = \int d(X, Q(X)) P(X)\, dX,$$

$$\mathcal{D} = \sum_i \int_{\mathcal{R}_i} d(X, c) P(X)\, dX.$$

Clearly,

$$\sum_i \int_{\mathcal{R}_i} d(X, c) P(X)\, dX \ge \sum_i \int_{\mathcal{R}_i} d(X, c_i) P(X)\, dX.$$

Therefore, for a given encoder, the optimum decoder uses the centroids of the
nearest-neighbor partition cells as its reproduction vectors.

Consider the set $A$ of all possible partitions of the input vectors, and a
collection $C^{\circ}$ of all possible reproduction sets. The optimum VQ is the pair

$$(\{\mathcal{R}_i\}, C); \quad \{\mathcal{R}_i\} \in A \ \text{and} \ C \in C^{\circ},$$

such that $\{\mathcal{R}_i\}$ is the nearest-neighbor partition of $C$, and $C$
contains the centroids of the cells in $\{\mathcal{R}_i\}$. These two conditions
are generalizations of the Lloyd-Max conditions for scalar quantizers.

3.3  Design  of  Vector  Quantizers 

Design of VQs is a very difficult problem. For a given probability distribution
$P(X)$, it is necessary to find the encoder and decoder that simultaneously
satisfy both the nearest-neighbor condition and the centroid
condition. Unfortunately, no closed-form solutions exist for even simple
distributions.

A number of methods have been proposed for the design of VQs. All these
are iterative methods based on finding the best VQ for a training set.

3.3.1  Generalized  Lloyd's  Algorithm 

The generalized Lloyd's algorithm (GLA) (also known as the LBG algorithm
after Linde, Buzo, and Gray [1]) is an iterative algorithm. This algorithm,
which is similar to the $k$-means clustering algorithm, consists of two
basic steps:

•  For a given codebook $C_t$, find the best partition $\{\mathcal{R}_i\}_t$ of the
training set satisfying the nearest-neighbor condition.

•  For the new partition $\{\mathcal{R}_i\}_t$, find the best reproduction codebook
$C_{t+1}$ satisfying the centroid condition.

These two steps are repeated until the required codebook is obtained. The
training algorithm begins with an initial codebook, which is refined by the
Lloyd's iterations until an acceptable codebook is obtained. A codebook is
considered acceptable if the difference in error between the present and the
previous codebooks is less than a threshold.
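The two alternating steps can be sketched as follows. This is a minimal illustration, not the implementation used for the results in this report: squared error is assumed as the distortion measure, the initial codebook is supplied by the caller, empty cells simply keep their old codevector, and the names `gla`, `nearest`, and `centroid` as well as the default threshold are hypothetical.

```python
def sq_err(x, c):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def nearest(x, codebook):
    """Index of the codevector closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_err(x, codebook[i]))


def centroid(cell):
    """Component-wise mean of the vectors in a non-empty cell."""
    return tuple(sum(col) / len(cell) for col in zip(*cell))


def gla(training, codebook, tol=1e-9, max_iter=100):
    """Refine a codebook by alternating nearest-neighbor and centroid steps."""
    codebook = [tuple(c) for c in codebook]
    prev = float("inf")
    for _ in range(max_iter):
        # Step 1: nearest-neighbor partition of the training set for C_t.
        cells = [[] for _ in codebook]
        total = 0.0
        for x in training:
            i = nearest(x, codebook)
            cells[i].append(x)
            total += sq_err(x, codebook[i])
        distortion = total / len(training)
        # Stop when the improvement in average distortion falls below the threshold.
        if prev - distortion < tol:
            break
        prev = distortion
        # Step 2: centroid update gives C_{t+1}; empty cells keep their codevector.
        codebook = [centroid(cell) if cell else c for cell, c in zip(cells, codebook)]
    return codebook


training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
trained = gla(training, [(0, 0), (10, 10)])
```

The quality of the result depends on the initial codebook, since the iteration only converges to a local minimum of the training-set distortion.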
3.3.2 Kohonen's Self-Organizing Feature Map


Kohonen's self-organizing feature maps (KSOFMs) can be used to design
VQs with optimal codebooks [2,3]. In this method of codebook design, an
energy function for error is formulated and minimized iteratively. This design
procedure is sequential, unlike GLA, which uses the batch method of
training.


3.4  Entropy-Constrained  Vector  Quantizer 

For transmission over a binary channel, the index of the reproduction vector
from a codebook $C$ of size $N$ is represented by a binary string of length
$\lceil \log_2 N \rceil$ bits. Often, it is possible to further reduce the number
of bits required to represent the indices by using entropy coding as shown in
figure 6. Entropy coding reduces the transmission rate from $\lceil \log_2 N \rceil$
bits per block to almost the entropy rate of the index sequence. Since typical
codebook design algorithms do not consider the possible entropy rates of
the index sequences, the codebooks do not combine with an entropy coder
in an optimal way. Design of entropy-constrained VQs (ECVQs) has been
studied by Chou et al. [4] (among others), who used a Lagrangian formulation
with a gradient-based algorithm similar to Lloyd's algorithm to
design the codebooks.

Consider a vector $X \in \mathcal{R}^k$ quantized by a VQ with a codebook
$C = \{c_j : j = 1, \ldots, N\}$. Let $l(i)$ represent the length of the binary
string used to represent the index $i$ of the reproduction vector $c_i$ of $X$.
Then the energy function that is minimized in the design of an ECVQ is given by [4]

$$J(\gamma, \beta) = E[d(X, Q(X))] + \lambda E[l(i)], \qquad (2)$$

where $\gamma$ and $\beta$ are the VQ encoder and decoder, respectively, and
$\lambda$ is the Lagrange multiplier that weights rate against distortion. The index
entropy $\log(1/p(i))$ is used in the algorithm to represent the length of the
binary string required to represent the index $i$. The codebook is then designed
in an iterative manner, similar to Lloyd's algorithm, through choosing
an encoder and decoder that decrease the energy function in equation (2)
at every iteration. Experimental results have shown that the ECVQ design
algorithm described above gives an encoder-decoder pair that has superior
numerical performance.

Figure 6. Entropy coding of VQ indices.
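The effect of the rate term in equation (2) on the encoder can be illustrated with a short sketch. For a fixed codebook and fixed index probabilities, the encoder picks the index that minimizes distortion plus $\lambda$ times the ideal codeword length $\log_2(1/p(i))$. The function name `ecvq_encode`, the squared-error distortion, and the toy probabilities are assumptions for illustration; the full design loop of [4], which also updates the codebook and the probabilities, is not reproduced here.

```python
import math


def ecvq_encode(x, codebook, probs, lam):
    """Pick the index minimizing d(x, c_i) + lam * l(i), with l(i) = log2(1 / p(i))."""
    def cost(i):
        distortion = sum((a - b) ** 2 for a, b in zip(x, codebook[i]))
        length = -math.log2(probs[i])  # ideal codeword length in bits
        return distortion + lam * length
    return min(range(len(codebook)), key=cost)


codebook = [(0.0,), (1.0,)]
probs = [0.9, 0.1]          # hypothetical index probabilities
x = (0.6,)
plain_index = ecvq_encode(x, codebook, probs, lam=0.0)   # pure nearest neighbor
biased_index = ecvq_encode(x, codebook, probs, lam=0.1)  # rate-aware choice
```

With $\lambda = 0$ the rule reduces to the nearest-neighbor encoder; a positive $\lambda$ biases the choice toward frequently used (cheaply coded) indices at the price of slightly higher distortion.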


3.5  Video  Compression  Using  Vector  Quantization 


The residual signal obtained after motion compensation can be compressed
by vector quantization. The two-dimensional signal is divided into blocks
of equal size, as shown in figure 7. The VQ encoder that is used to compress
the residual signal is a nearest-neighbor encoder. It has a reference lookup
table that contains the centroids of the VQ partitions. The encoder compares
each block (in some predefined scanning order) with each member
of the lookup table to find the closest match in terms of the defined distortion
measure (usually the mean-squared error). The index of the closest
matching codevector in the lookup table is then transmitted/stored as the
compressed representation of the corresponding vector (block).


The decoder is a simple lookup table decoder, as shown in figure 8. The
decoder uses the index symbol generated by the encoder as a reference to
an entry in a lookup table in the decoder. The lookup table in the decoder
is usually identical to the one in the encoder. This lookup table contains the
possible approximations for the blocks in the reconstructed image. Based
on the index, the approximate representation of the current block is determined.
To generate the reconstructed two-dimensional array, the decoder
places this representation at the position corresponding to the scanning order.
The reconstructed array is used along with the motion-compensation
algorithm to reproduce the compressed video sequence.
Figure 7. VQ encoder for two-dimensional arrays.

Figure 8. VQ decoder for two-dimensional arrays.
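The block-based encoder and lookup-table decoder of section 3.5 can be sketched as below. The raster scanning order, the 2 x 2 block size, the toy codebook, and all function names are assumptions made for illustration, and the array dimensions are assumed to be multiples of the block size.

```python
def sq_err(x, c):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def image_to_blocks(img, bs):
    """Raster-scan a 2-D array into bs-by-bs blocks, each flattened to a vector."""
    h, w = len(img), len(img[0])
    return [tuple(img[by + y][bx + x] for y in range(bs) for x in range(bs))
            for by in range(0, h, bs) for bx in range(0, w, bs)]


def encode_image(img, codebook, bs):
    """Nearest-neighbor encoder: one codebook index per block, in scanning order."""
    return [min(range(len(codebook)), key=lambda i: sq_err(b, codebook[i]))
            for b in image_to_blocks(img, bs)]


def decode_image(indices, codebook, h, w, bs):
    """Lookup-table decoder: place each reproduction block at its scan position."""
    img = [[0] * w for _ in range(h)]
    it = iter(indices)
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            vec = codebook[next(it)]
            for y in range(bs):
                for x in range(bs):
                    img[by + y][bx + x] = vec[y * bs + x]
    return img


codebook = [(0, 0, 0, 0), (9, 9, 9, 9)]   # toy codebook of flattened 2 x 2 blocks
image = [[1, 0, 8, 9],
         [0, 0, 9, 9]]
indices = encode_image(image, codebook, bs=2)
reconstructed = decode_image(indices, codebook, h=2, w=4, bs=2)
```

In the video coder, `image` would be the motion-compensated residue rather than the frame itself, and the reconstructed residue would be added back to the motion-compensated prediction.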
4. Residual Vector Quantization


Residual  vector  quantization  (RVQ)  is  a  structured  vector  quantization 
scheme  proposed  mainly  to  overcome  the  search  and  storage  complexities 
of  regular  VQs  [5,6].  Residual  VQs  are  also  known  as  multistage  VQs.  They 
consist  of  a  number  of  cascaded  VQs.  Each  stage  has  a  VQ  with  a  small 
codebook  that  quantizes  the  error  signal  from  the  previous  stage.  Residual 
quantizers  are  successive-refinement  quantizers,  where  the  information  to 
be transmitted/stored is first approximated coarsely and then refined in
the  successive  stages. 

4.1  Residual  Quantization 

Consider a random variable $X$ with a probability distribution function
$P(X)$. Let $Q^1$ be an $N^1$-level quantizer, and let its associated bit rate be
$\log_2(N^1)$ bits. The error due to this quantizer is

$$R^1 = X - Q^1(X),$$

and the expected value of the distortion is given by

$$\mathcal{D}^1 = \int d(X, Q^1(X)) P(X)\, dX.$$

If the random variable needs to be represented more precisely (i.e., if the
expected value of the distortion needs to be smaller), the first-stage residue
$R^1$ can be quantized again by a second quantizer $Q^2$. The quantizer $Q^2$
approximates the random variable $R^1$. Let $Q^2$ be an $N^2$-level quantizer,
and let its associated bit rate be $\log_2(N^2)$ bits. The error due to this quantizer
is given by

$$R^2 = R^1 - Q^2(R^1),$$

and the expected value of distortion is now

$$\mathcal{D}^2 = \int d[X, (Q^1(X) + Q^2(R^1))] P(X)\, dX.$$

This process can be thought of as a cascade of two quantizers, as shown in
figure 9. The total bit rate of the quantization scheme is
$\log_2(N^1) + \log_2(N^2)$.
Figure 9. Residual quantizer: cascade of two quantizers.

This scheme can be extended to any number of quantizers. A $K$-stage residual
quantizer consists of $K$ quantizers $\{Q^k : k = 1, \ldots, K\}$. Each quantizer
$Q^k$ quantizes the residue of the previous stage, $R^{k-1}$. The total bit rate of
the quantization scheme is given by

$$B = \sum_{k=1}^{K} \log_2(N^k),$$

where $N^k$ is the number of quantization levels of the quantizer $Q^k$.

4.2  Residual  Vector  Quantizer 

A residual VQ is a vector generalization of the residual quantizer outlined
above. A $K$-stage residual VQ is composed of $K$ VQs $\{Q^k : k = 1, \ldots, K\}$.
Each VQ $Q^k$ consists of its own codebook $C^k$ of size $N^k$. The $k$th-stage VQ
operates on the residue $R^{k-1}$ from the previous stage. The residue due to
the first stage is given by

$$R^1 = X - Q^1(X).$$

The final quantized value of the vector $X$ is given by

$$Q(X) = Q^1(X) + Q^2(R^1) + \cdots + Q^k(R^{k-1}) + \cdots + Q^K(R^{K-1}).$$
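The cascade can be sketched as follows: each stage quantizes the residue left by the previous one, and the decoder sums the selected codevectors. The stage-by-stage (greedy) encoding shown here is only one of the search options discussed in section 4.3, and the function names and toy codebooks are hypothetical.

```python
def sq_err(x, c):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def rvq_encode(x, stage_codebooks):
    """Quantize x stage by stage; stage k quantizes the residue R^{k-1}."""
    residue = tuple(x)
    indices = []
    for cb in stage_codebooks:
        i = min(range(len(cb)), key=lambda j: sq_err(residue, cb[j]))
        indices.append(i)
        residue = tuple(r - c for r, c in zip(residue, cb[i]))  # R^k
    return indices


def rvq_decode(indices, stage_codebooks):
    """Sum the selected codevectors; a prefix of indices gives a coarser version."""
    out = [0.0] * len(stage_codebooks[0][0])
    for i, cb in zip(indices, stage_codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return tuple(out)


# Three 1-bit stages on scalars (vectors of dimension 1): total rate 3 bits.
stages = [[(0.0,), (8.0,)], [(-2.0,), (2.0,)], [(-1.0,), (1.0,)]]
indices = rvq_encode((5.2,), stages)
coarse = rvq_decode(indices[:1], stages)   # after the first stage only
fine = rvq_decode(indices, stages)         # after all three stages
```

Decoding a prefix of the index list yields the successive-refinement behavior noted in the introduction to this section: each additional stage index sharpens the previous approximation.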

4.3  Search  Techniques  for  Residual  Vector  Quantizers 

The structure of a residual VQ inherently lends itself to a number of possible
encoding schemes. Two of the main characteristics of encoding in residual
quantizers are

•  overall  optimality — the  least  overall  distortion  at  the  end  of  the  last 
stage  of  encoding,  and 

•  stage-wise  optimality — the  least  distortion  possible  at  the  end  of  each 
stage. 
4.3.1 Exhaustive Search


Exhaustive search in residual quantizers aims at achieving the least overall
distortion. In exhaustive search schemes, all possible combinations of all
the stage quantizations are searched, and the combination giving rise to the
least distortion is chosen. This search gives the best possible performance
in the residual quantization scheme. However, it is computationally very
expensive: its search cost is the same as that of an unstructured VQ. The
search complexity for a $K$-stage VQ with codebook sizes $\{N_1, N_2, \ldots, N_K\}$
is of the order $O(N_1 \times N_2 \times \cdots \times N_K)$. Exhaustive search
schemes are not particularly appropriate for progressive transmission schemes
(successive refinement).
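A minimal sketch of exhaustive search is given below: every combination of stage indices is tried, and the one whose summed codevectors are closest to the input is kept. The function name and the toy codebooks are hypothetical, and squared error is assumed as the distortion measure.

```python
from itertools import product


def rvq_exhaustive_encode(x, stage_codebooks):
    """Try all N_1 x N_2 x ... x N_K index combinations; return (indices, error)."""
    best, best_err = None, float("inf")
    index_ranges = [range(len(cb)) for cb in stage_codebooks]
    for combo in product(*index_ranges):
        # Reconstruction is the sum of one codevector from each stage.
        recon = [sum(cb[i][d] for i, cb in zip(combo, stage_codebooks))
                 for d in range(len(x))]
        err = sum((a - b) ** 2 for a, b in zip(x, recon))
        if err < best_err:
            best, best_err = list(combo), err
    return best, best_err


stages = [[(0.0,), (4.0,)], [(-3.0,), (0.5,)]]
best_indices, best_error = rvq_exhaustive_encode((1.9,), stages)
```

For this toy input the best combination starts with the first-stage codevector 4.0, which is not the first-stage nearest neighbor of 1.9. A coarse first-stage decision made with the final error in mind is exactly what a stage-by-stage search cannot do.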

4.3.2  Sequential  Search 

Sequential search in residual quantizers makes full use of the structural
constraint of the quantizer. The search process is stage by stage, wherein
the quantization value that minimizes the distortion up to that stage is
chosen. This search scheme is inherently inferior to exhaustive schemes,
leading usually to suboptimal overall distortion performance. It is also the
least expensive of all search schemes. The search complexity for a $K$-stage
quantizer with codebook sizes $\{N_1, N_2, \ldots, N_K\}$ is of the order
$O(N_1 + N_2 + \cdots + N_K)$. The search scheme is particularly well-suited for
progressive transmission schemes.


4.3.3  M-Search 


A hybrid search scheme, M-search, has been proposed [7] whose search
complexity is less than that of a full search, but greater than that of a sequential
search. This scheme produces overall distortion performance that is better
than that of sequential search schemes and close to that of the exhaustive
search scheme. In this scheme, a subset of the quantization values is chosen
at each stage based on the least distortion; these subsets are searched in an
exhaustive fashion to get the quantized value.
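One way to read the M-search description is as a beam search that keeps the M lowest-distortion partial paths at each stage. The sketch below follows that reading; it is an assumption, and the exact procedure of [7] may differ in detail. The function name `rvq_m_search` and the toy codebooks are hypothetical.

```python
def rvq_m_search(x, stage_codebooks, m):
    """Keep the m best partial index paths per stage; return (indices, error)."""
    # Each candidate is (accumulated squared error, index path, current residue).
    candidates = [(0.0, [], tuple(x))]
    for cb in stage_codebooks:
        expanded = []
        for _, path, residue in candidates:
            for i, c in enumerate(cb):
                new_residue = tuple(r - ci for r, ci in zip(residue, c))
                err = sum(r * r for r in new_residue)
                expanded.append((err, path + [i], new_residue))
        expanded.sort(key=lambda cand: cand[0])
        candidates = expanded[:m]  # survivors passed to the next stage
    err, path, _ = candidates[0]
    return path, err


stages = [[(0.0,), (4.0,)], [(-3.0,), (0.5,)]]
sequential_like = rvq_m_search((1.9,), stages, m=1)
wider = rvq_m_search((1.9,), stages, m=2)
```

With `m=1` the procedure reduces to sequential search, and with `m` at least as large as the number of partial paths at every stage it examines the same combinations as exhaustive search, so `m` trades search cost against distortion.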

4.4  Structure  of  Residual  VQs 

A quantizer partitions an input space into a finite number of polytopal
regions (fig. 10). The centroid of each polytope approximates all the input
symbols that belong to that particular region. The process of finding
the residue of the signal is equivalent to shifting the coordinate system to
the centroid of the polytope. This process is repeated for all the polytopes.
Therefore, we have a finite set of spaces, each corresponding to a polytope.
These spaces are bounded by the underlying polytope of the partition; i.e.,
each of these spaces contains members around the origin that are limited
in location by the polytope to which they belong.

Figure 10. Structure of a residual VQ.

Now consider the quantization of each of these spaces. If the optimal quantizer
is found for each of these spaces, their structures (i.e., the partition of
the input space) may be totally different. Such a quantizer is called a
tree-structured quantizer. If a constraint is imposed such that the same partition
structure is used for all the subspaces, then the method of quantization is
the residual quantization scheme.


4.5  Optimality  Conditions  for  Residual  Quantizers 
4.5.1  Overall  Optimality 


Consider a $K$-stage residual quantizer with a set of quantizers $\mathcal{Q}$,

$$\mathcal{Q} = \{Q^1, Q^2, \ldots, Q^k, \ldots, Q^K\},$$

and codebooks $\mathcal{C}$,

$$\mathcal{C} = \{C^1, C^2, \ldots, C^k, \ldots, C^K\},$$

with stage indices $\mathcal{K} = \{k : k = 1, \ldots, K\}$. Each stage codebook
contains $N^k$ codevectors, $C^k = \{c^k_1, c^k_2, \ldots, c^k_{N^k}\}$. As with VQs,
we derive two optimality conditions. For the first condition, given the encoder,
we find the best possible decoder. For the second condition, we find the best
possible encoder for a given decoder. We derive the conditions for a particular
stage, assuming that all other stages have fixed encoders and decoders.


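Centroid condition:

Let the index vector be $\mathbf{I} = (i^1, i^2, \ldots, i^K)$; $\mathbf{I}$
belongs to the index space
$\mathcal{I} = \{\mathbf{I} : \mathbf{I} = (i^1, i^2, \ldots, i^K),\ i^k = 1, \ldots, N^k\}$.
Let the partition of the input space be
$\mathcal{P}_{\mathbf{I}} = \mathcal{P}_{i^1 i^2 \ldots i^K}$, based on the
different stage quantizers. This partition is based on a fixed encoder. To
find the best decoder for the stage $\kappa$, let the decoders of stages
$\mathcal{K} \setminus \{\kappa\}$ (all stages other than $\kappa$) be fixed.
Overall distortion is given by

$$D(X, Q(X)) = \sum_{\mathbf{I} \in \mathcal{I}} \sum_{X \in \mathcal{P}_{\mathbf{I}}} \left\| X - c^1_{i^1} - c^2_{i^2} - \cdots - c^{\kappa-1}_{i^{\kappa-1}} - c^{\kappa}_{i^{\kappa}} - c^{\kappa+1}_{i^{\kappa+1}} - \cdots - c^K_{i^K} \right\|^2 P(X). \tag{3}$$

For the codevector $c^{\kappa}_{i^{\kappa}}$ of the $\kappa$th stage to be
optimal, the following has to be true:

$$\frac{\partial D(X, Q(X))}{\partial c^{\kappa}_{i^{\kappa}}} = 0; \tag{4}$$

that is,

$$\sum_{\mathbf{I} \in \mathcal{I}_{\kappa}} \sum_{X \in \mathcal{P}_{\mathbf{I}}} \left( X - c^1_{i^1} - c^2_{i^2} - \cdots - c^{\kappa-1}_{i^{\kappa-1}} - c^{\kappa}_{i^{\kappa}} - \cdots - c^K_{i^K} \right) P(X) = 0, \tag{5}$$

where $\mathcal{I}_{\kappa}$ is the set of index vectors
$\mathbf{I} \in \mathcal{I}$ whose $\kappa$th component equals $i^{\kappa}$.
Solving for $c^{\kappa}_{i^{\kappa}}$, we get

$$c^{\kappa}_{i^{\kappa}} = \frac{\sum_{\mathbf{I} \in \mathcal{I}_{\kappa}} \sum_{X \in \mathcal{P}_{\mathbf{I}}} \left( X - c^1_{i^1} - c^2_{i^2} - \cdots - c^{\kappa-1}_{i^{\kappa-1}} - c^{\kappa+1}_{i^{\kappa+1}} - \cdots - c^K_{i^K} \right) P(X)}{\sum_{\mathbf{I} \in \mathcal{I}_{\kappa}} \sum_{X \in \mathcal{P}_{\mathbf{I}}} P(X)}. \tag{6}$$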


A  similar  equation  has  been  derived  elsewhere  [7].  This  result  has  been 
described  [7]  as  the  centroid  of  the  grafted  residue. 
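
A minimal empirical sketch of the centroid in equation (6) follows. It
assumes that equally weighted training vectors stand in for $P(X)$ and that
the encoder decisions (one index per stage for every training vector) are
already fixed; the name `grafted_centroid`, the zero-based stage numbering,
and the sample data are hypothetical.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def grafted_centroid(
    train: Sequence[Vector],
    indices: Sequence[Sequence[int]],
    stages: Sequence[Sequence[Vector]],
    kappa: int,
    j: int,
) -> Vector:
    """Empirical counterpart of equation (6) for codevector j of stage kappa."""
    total = [0.0] * len(train[0])
    count = 0
    for x, idx in zip(train, indices):
        if idx[kappa] != j:
            continue  # x is not mapped to codevector j at stage kappa
        grafted = list(x)
        for k, codebook in enumerate(stages):
            if k != kappa:  # subtract every stage except the one being updated
                grafted = [g - c for g, c in zip(grafted, codebook[idx[k]])]
        total = [t + g for t, g in zip(total, grafted)]
        count += 1
    if count == 0:
        return stages[kappa][j]  # unused codevector: leave it unchanged
    return tuple(t / count for t in total)


STAGES = [[(0.0,), (10.0,)], [(-1.0,), (1.0,)]]
TRAIN = [(0.5,), (1.5,), (10.5,), (12.5,)]
INDICES = [(0, 1), (0, 1), (1, 1), (1, 1)]  # fixed encoder decisions
UPDATED = grafted_centroid(TRAIN, INDICES, STAGES, kappa=1, j=1)
```

In this toy data, the second codevector of the second stage moves from 1.0 to
1.25, the mean of the residues 0.5, 1.5, 0.5, and 2.5.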

Nearest-neighbor  condition: 

For fixed decoders and fixed encoders for stages
$\mathcal{K} \setminus \{\kappa\}$, the optimal stage $\kappa$ encoder is one
that either minimizes the overall distortion or minimizes the distortion for
that stage. In either case, the mapping that produces the least distortion is
the nearest-neighbor mapping. For exhaustive search decoders, the best encoder
is the nearest-neighbor mapping for the direct-sum codebook.
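
The difference between stage-wise and direct-sum nearest-neighbor mappings can
be seen in a small sketch. The one-dimensional codebooks and the function
names are hypothetical; the exhaustive search simply enumerates every sum of
one codevector per stage.

```python
from itertools import product
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def sequential_encode(x: Vector, stages: Sequence[Sequence[Vector]]) -> List[int]:
    """Nearest neighbor at each stage, applied to the running residual."""
    residual, out = x, []
    for cb in stages:
        i = min(range(len(cb)), key=lambda n: sq_dist(residual, cb[n]))
        out.append(i)
        residual = tuple(r - c for r, c in zip(residual, cb[i]))
    return out


def exhaustive_encode(x: Vector, stages: Sequence[Sequence[Vector]]) -> List[int]:
    """Nearest neighbor over the direct-sum codebook (all index combinations)."""
    best, best_d = None, float("inf")
    for idx in product(*(range(len(cb)) for cb in stages)):
        recon = tuple(
            sum(cb[i][d] for cb, i in zip(stages, idx)) for d in range(len(x))
        )
        d = sq_dist(x, recon)
        if d < best_d:
            best, best_d = list(idx), d
    return best


STAGES = [[(0.0,), (10.0,)], [(-6.0,), (6.0,)]]  # illustrative values only
X = (4.9,)
SEQ = sequential_encode(X, STAGES)   # picks 0, then +6: reconstruction 6.0
EXH = exhaustive_encode(X, STAGES)   # picks 10 and -6: reconstruction 4.0
```

For this input the sequential search settles on a reconstruction of 6.0, while
the direct-sum search finds 4.0, which is closer to 4.9. The price is a search
over the product of the codebook sizes.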


4.5.2  Causal  Stages  Optimality 

For  the  encoder  to  be  optimal  in  terms  of  quantizers  up  to  the  present  stage, 
the  optimality  conditions  are  as  follows. 

Centroid  condition: 

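Let the partition of the input space be
$\mathcal{P}'_{\mathbf{I}'} = \mathcal{P}_{i^1 i^2 \ldots i^{\kappa}}$, based on
the causal stage quantizers. For a fixed encoder, the optimal $\kappa$th stage
codevector is given by

$$c'^{\kappa}_{i^{\kappa}} = \frac{\sum_{\mathbf{I}' \in \mathcal{I}'_{\kappa}} \sum_{X \in \mathcal{P}'_{\mathbf{I}'}} \left( X - c^1_{i^1} - c^2_{i^2} - \cdots - c^{\kappa-1}_{i^{\kappa-1}} \right) P(X)}{\sum_{\mathbf{I}' \in \mathcal{I}'_{\kappa}} \sum_{X \in \mathcal{P}'_{\mathbf{I}'}} P(X)}, \tag{7}$$

where $\mathcal{I}'_{\kappa} = \{\mathbf{I}' : \mathbf{I}' = (i^1, \ldots, i^k, \ldots, i^{\kappa}),\ i^k = 1, \ldots, N^k\}$
is the set of index vectors of the causal stages including the present stage;
$c'^{\kappa}_{i^{\kappa}}$ is the centroid of the direct partition up to the
$\kappa$th stage.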

Nearest-neighbor  condition: 

For fixed decoders and fixed encoders for stages
$\mathcal{K} \setminus \{\kappa\}$, the optimal $\kappa$th stage encoder is the
nearest-neighbor mapping encoder.


4.5.3  Simultaneous  Causal  and  Overall  Optimality 

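Consider the $\kappa$th stage encoder of a $K$-stage residual VQ. Let us
assume that it is optimal in terms of both causal and overall distortion.
Then it satisfies the following two equations simultaneously:

$$c^{\kappa}_{i^{\kappa}} = \frac{\sum_{\mathbf{I} \in \mathcal{I}_{\kappa}} \sum_{X \in \mathcal{P}_{\mathbf{I}}} \left( X - c^1_{i^1} - c^2_{i^2} - \cdots - c^{\kappa-1}_{i^{\kappa-1}} - c^{\kappa+1}_{i^{\kappa+1}} - \cdots - c^K_{i^K} \right) P(X)}{\sum_{\mathbf{I} \in \mathcal{I}_{\kappa}} \sum_{X \in \mathcal{P}_{\mathbf{I}}} P(X)}, \tag{8}$$

$$c^{\kappa}_{i^{\kappa}} = \frac{\sum_{\mathbf{I}' \in \mathcal{I}'_{\kappa}} \sum_{X \in \mathcal{P}'_{\mathbf{I}'}} \left( X - c^1_{i^1} - c^2_{i^2} - \cdots - c^{\kappa-1}_{i^{\kappa-1}} \right) P(X)}{\sum_{\mathbf{I}' \in \mathcal{I}'_{\kappa}} \sum_{X \in \mathcal{P}'_{\mathbf{I}'}} P(X)}. \tag{9}$$

The two denominators are equal; therefore,

$$\sum_{\mathbf{I} \in \mathcal{I}_{\kappa}} \sum_{X \in \mathcal{P}_{\mathbf{I}}} \left( X - c^1_{i^1} - c^2_{i^2} - \cdots - c^{\kappa-1}_{i^{\kappa-1}} - c^{\kappa+1}_{i^{\kappa+1}} - \cdots - c^K_{i^K} \right) P(X) = \sum_{\mathbf{I}' \in \mathcal{I}'_{\kappa}} \sum_{X \in \mathcal{P}'_{\mathbf{I}'}} \left( X - c^1_{i^1} - c^2_{i^2} - \cdots - c^{\kappa-1}_{i^{\kappa-1}} \right) P(X). \tag{10}$$

Simplification of equation (10) gives

$$\sum_{\mathbf{I} \in \mathcal{I}_{\kappa}} \sum_{X \in \mathcal{P}_{\mathbf{I}}} \left( c^{\kappa+1}_{i^{\kappa+1}} + \cdots + c^K_{i^K} \right) P(X) = 0. \tag{11}$$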


Basically, for the encoder to have simultaneous global and stage-wise
optimality at any given stage $\kappa$, the probability-weighted sum of the
codevectors of stages $\kappa + 1, \ldots, K$ must equal zero. This suggests
that a successive-refinement residual VQ is, in general, not optimal. A
rigorous treatment of successive approximation is given by Equitz and Cover
[8].
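
The simplification leading from equation (10) to equation (11) can be spelled
out as follows. This is a sketch that assumes both sides of equation (10) run
over the same inputs, namely every $X$ that the stage-$\kappa$ encoder maps to
index $i^{\kappa}$ (which is also why the two denominators agree); the compact
sums over $k < \kappa$ and $k > \kappa$ are shorthand introduced here.

```latex
\begin{align*}
0 &= \sum_{\mathbf{I}} \sum_{X}
      \Bigl( X - \sum_{k<\kappa} c^{k}_{i^{k}} - \sum_{k>\kappa} c^{k}_{i^{k}} \Bigr) P(X)
     - \sum_{\mathbf{I}} \sum_{X}
      \Bigl( X - \sum_{k<\kappa} c^{k}_{i^{k}} \Bigr) P(X) \\
  &= - \sum_{\mathbf{I}} \sum_{X}
      \Bigl( \sum_{k>\kappa} c^{k}_{i^{k}} \Bigr) P(X).
\end{align*}
```

The terms in $X$ and in the causal codevectors cancel, so only the
later-stage codevectors, weighted by the probability of the inputs that select
them, remain.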

4.6 Design of Residual Vector Quantizers

The design of residual VQs is based on a number of trade-offs, and different
training methods are used for different coding schemes [9]. Sequential search
quantizer codebooks in general are different from exhaustive search quantizer
codebooks. A number of design methods have been proposed. Gupta et al [10]
have proposed a joint codebook design in which all stages but one are fixed,
and one particular stage codebook is adapted to minimize overall distortion.
During the next step of the iteration, another codebook is adapted; this step
is repeated cyclically until the required convergence is obtained. Barnes and
Frost [7] use a similar algorithm for codebook design. Rizvi and Nasrabadi
[11] have proposed a design algorithm based on the Kohonen network, where an
energy is iteratively minimized to reduce the overall distortion.

4.7  Residual  Vector  Quantization  with  Variable  Block  Size 

The residual VQ outlined thus far quantizes blocks (vectors) of the same size
at each stage. When an input sequence to be quantized contains nonstationary
artifacts, it is often difficult to compress with fixed-block-size quantizers.
Blocks containing discontinuities are quantized rather poorly by all the
stages, or they require a large number of residual VQ stages. One way to solve
the problem is to use smaller block sizes at the later stages of the residual
VQ. A variable-block-size residual VQ is shown in figure 11. In this figure,
the first-stage quantizer uses blocks of size 4, the second-stage quantizer
uses blocks of size 2, and the third-stage quantizer uses blocks of size 1
(the third-stage quantizer is a scalar quantizer in this example).
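
The 4/2/1 arrangement can be sketched on a one-dimensional signal as follows.
The codebooks are hypothetical, and the sketch assumes that the signal length
is a multiple of the largest block size.

```python
from typing import List, Sequence, Tuple

Codebook = Sequence[Tuple[float, ...]]


def quantize_blocks(signal: Sequence[float], size: int, codebook: Codebook) -> List[float]:
    """Quantize consecutive blocks of length `size`; return the concatenated codevectors."""
    out: List[float] = []
    for start in range(0, len(signal), size):
        block = signal[start:start + size]
        best = min(codebook, key=lambda c: sum((b - v) ** 2 for b, v in zip(block, c)))
        out.extend(best)
    return out


def vbs_rvq(signal: Sequence[float], stages: Sequence[Tuple[int, Codebook]]) -> List[float]:
    """Each stage quantizes the residual of the previous stage with its own block size."""
    residual = list(signal)
    recon = [0.0] * len(signal)
    for size, codebook in stages:
        approx = quantize_blocks(residual, size, codebook)
        recon = [r + a for r, a in zip(recon, approx)]
        residual = [r - a for r, a in zip(residual, approx)]
    return recon


# Hypothetical codebooks: block sizes 4, 2, and 1 as in the figure.
STAGES = [
    (4, [(0.0,) * 4, (8.0,) * 4]),
    (2, [(-2.0, -2.0), (0.0, 0.0), (2.0, 2.0)]),
    (1, [(-1.0,), (0.0,), (1.0,)]),
]
RECON = vbs_rvq([9.0, 11.0, 7.0, 6.0], STAGES)
```

Here the first stage captures the overall level, the second stage corrects
each half of the block, and the scalar third stage removes the remaining
sample-wise error.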

4.8  Pruned  Variable-Block-Size  Residual  Vector  Quantizer 

Often not every part of a digital signal needs to be quantized by all the
stages of the residual VQ. Sections of the signal containing little or no
information can easily be represented by just one or two stages. Restricting
the number of stages for a particular section of the signal is equivalent to
pruning the tree structure, as shown in figure 11. There are a number of
different ways of pruning the tree. One significant characteristic of a pruned
variable-block-size residual VQ is that very-low-bit-rate side information
needs to be stored/transmitted. This side information determines the
particular tree structure used for every block. The tree structure is required
by the decoder, as it needs to know the number of stages used for every part
of the sequence.

Figure 11. Tree structure of variable-block-size residual VQ. Pruning of right
tree corresponds to variable-rate/variable-block-size residual VQ.

4.8.1  Top-Down  Pruning  Using  a  Predefined  Threshold 

This algorithm decides the number of stages to be used for a particular block
by examining the error after every stage. If the error at the end of a
particular stage is less than a threshold, then the quantization can be
stopped at that stage. The error is measured by a significance measure, such
as the L2 norm (mean squared error). If the L2 distance after stage $k$ is
less than a threshold $\tau_k$, then quantization is stopped at that stage.
Choice of the threshold determines the performance of this algorithm. When a
single global threshold $\tau = \tau_1 = \ldots = \tau_k = \ldots = \tau_K$ is
used, it directly controls the output bit rate of the quantizer.
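
A minimal sketch of this stopping rule with a single global threshold is given
below. The codebooks, the threshold value, and the name `pruned_encode` are
hypothetical, and the mean squared residual is used as the significance
measure.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def pruned_encode(x: Vector, stages: Sequence[Sequence[Vector]], tau: float) -> List[int]:
    """Quantize stage by stage and stop once the mean squared residual drops below tau."""
    residual = x
    indices: List[int] = []
    for codebook in stages:
        i = min(
            range(len(codebook)),
            key=lambda n: sum((r - c) ** 2 for r, c in zip(residual, codebook[n])),
        )
        indices.append(i)
        residual = tuple(r - c for r, c in zip(residual, codebook[i]))
        mse = sum(r * r for r in residual) / len(residual)
        if mse < tau:
            break  # remaining stages are pruned for this block
    return indices


STAGES = [[(0.0,), (10.0,)], [(-2.0,), (2.0,)], [(-0.5,), (0.5,)]]  # illustrative
ONE_STAGE = pruned_encode((10.4,), STAGES, tau=0.25)
TWO_STAGES = pruned_encode((7.6,), STAGES, tau=0.25)
ALL_STAGES = pruned_encode((8.9,), STAGES, tau=0.25)
```

The length of the returned index list is the side information that the decoder
needs for each block; raising `tau` shortens the lists and lowers the bit rate.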

4.8.2  Optimal  Pruning  in  the  Rate-Distortion  Sense 

Optimal pruning in the rate-distortion sense is a bottom-up pruning technique
in which a given block is quantized by all the stages. The quantization error
is measured after every stage and stored. For tree pruning, the number of bits
required to encode to a particular depth is traded off against the distortion.
Let $\mathcal{T}$ represent the set of all possible tree structures and
$Q_T(X)$ be the quantized value of $X$ corresponding to the tree structure
$T$. Let $L(T)$ represent the number of bits required to represent a
particular tree structure. For a particular tree structure $T$, let $I_T$
represent the set of indices after quantization (based on the tree structure)
and $L(I_T)$ represent the number of bits required to encode the indices. A
Lagrangian formulation can be made, and the tree can be pruned according to
this objective function. This technique is equivalent to finding a particular
tree structure that minimizes the following cost function:

$$d(X, Q_T(X)) + \lambda \cdot (L(I_T) + L(T)). \tag{12}$$

The value of the Lagrangian multiplier $\lambda$ controls the output bit rate
of the quantizer. It controls the slope of the tangent to the R-D curve of the
quantizer at different operating points, as shown in figure 12.
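
For a single block whose candidate trees differ only in depth, minimizing the
cost in equation (12) reduces to the small search sketched below. The
distortion and bit figures are invented for illustration, and `bits` is
assumed to already include both the index bits and the tree side information.

```python
from typing import Sequence


def best_stage_count(distortions: Sequence[float], bits: Sequence[float], lam: float) -> int:
    """Number of stages minimizing distortion + lam * bits.

    distortions[k] and bits[k] describe the block when stages 1..k+1 are kept.
    """
    costs = [d + lam * b for d, b in zip(distortions, bits)]
    return 1 + min(range(len(costs)), key=costs.__getitem__)


DIST = [9.0, 4.0, 3.5]  # error stored after each stage (hypothetical)
BITS = [3, 7, 14]       # cumulative bits to encode to each depth (hypothetical)
LOW_RATE = best_stage_count(DIST, BITS, lam=2.0)
MID_RATE = best_stage_count(DIST, BITS, lam=0.1)
HIGH_RATE = best_stage_count(DIST, BITS, lam=0.01)
```

A large multiplier penalizes bits heavily and keeps one stage; a small
multiplier keeps all three.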

4.9  Transform-Domain  Vector  Quantization  for  Large  Blocks 

Direct vector quantization of large blocks is computationally expensive, and
the design of the codebooks for large-block VQs is difficult. The complexity
of VQs can be reduced through transform vector quantization [12]. The data
vector is first transformed by a decorrelating transformation $\Phi$ such as
the discrete cosine transform (DCT). A masking function $M$ is then applied to
the transformed data to reduce the dimensionality of the vector. In its
simplest form, the masking function is a binary vector, and it truncates the
number of coefficients used. The masking function can also contain a
normalizing factor for each coefficient based on its variance. The resulting
vector is then quantized by a small-vector-dimension quantizer. For decoding,
the inverse of the mask function $M^{-1}$ is first applied (usually in the
form of padding with zeros), followed by the inverse transform $\Phi^{-1}$.
Because a unitary transform like the DCT compacts the signal energy into a
relatively small number of coefficients, a VQ of much smaller dimension
suffices. It is therefore possible to use transform VQs in the initial stages
of a variable-block-size residual VQ.

Figure 12. Optimal pruning of residual VQ in rate-distortion sense.
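
The transform, mask, quantize, pad, and inverse-transform chain can be
sketched in one dimension as follows. The sketch assumes an orthonormal
DCT-II, a binary mask that keeps the first `keep` coefficients, and a small
hypothetical transform-domain codebook; none of these choices is taken from
the report.

```python
import math
from typing import List, Sequence, Tuple


def dct(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(x))
        for k in range(n)
    ]


def idct(coeffs: Sequence[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def encode(x: Sequence[float], keep: int, codebook: Sequence[Tuple[float, ...]]) -> int:
    truncated = dct(x)[:keep]  # binary mask: keep the first `keep` coefficients
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(truncated, codebook[j])),
    )


def decode(index: int, n: int, codebook: Sequence[Tuple[float, ...]]) -> List[float]:
    padded = list(codebook[index]) + [0.0] * (n - len(codebook[index]))  # zero padding
    return idct(padded)


CODEBOOK = [(0.0, 0.0), (11.3, 0.0), (0.0, 5.0)]  # hypothetical 2-D codevectors
INDEX = encode([4.0] * 8, keep=2, codebook=CODEBOOK)
RECON = decode(INDEX, 8, CODEBOOK)
```

An 8-sample block is thus quantized by a 2-dimensional VQ; the decoder needs
only the index, the block length, and the codebook.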

4.10  Video  Compression  Using  Residual  Vector  Quantization 

4.10.1  Theory  of  Residual  Vector  Quantization 

The residual signal generated by the motion compensation algorithm, as
described in section 2.3, can be compressed by a residual VQ. This section
describes a particular implementation of an encoder using residual vector
quantization. In the first two stages of the encoder, quantization is
performed in the transform domain, as shown in figure 13. The residual signal
$r^0(i, j, t)$ is broken into blocks of dimension $m_1 \times n_1$,
represented by $R^0(I, J, t)$. The algorithm measures the variance of each
block and compares it to a threshold to determine if the block needs to be
encoded. Often, the background areas contain no information, since the motion
estimation algorithm predicts the data perfectly. The vectors $R^0(I, J, t)$
that require transmission are then transformed to produce the
transform-domain signal $\mathcal{R}^0(I, J, t)$ through a transform operator
$\Phi$:

$$\mathcal{R}^0(I, J, t) = \Phi[R^0(I, J, t)]. \tag{13}$$

Figure 13. Video encoder based on residual vector quantization.

A masking operator $M_1$ is then applied to the transformed vector to produce
a truncated vector of reduced dimension, $M_1[\mathcal{R}^0(I, J, t)]$. This
vector is quantized by the first-stage VQ. If $c^1_{i^1}$ is the best matching
codevector, the approximation of $R^0(I, J, t)$ is given by

$$Q_1[R^0(I, J, t)] = \Phi^{-1}[M_1^{-1}(c^1_{i^1})], \tag{14}$$

where $M_1^{-1}$ and $\Phi^{-1}$ are the inverses of the masking function and
the transform operator. The error after the first-stage quantization is given
by

$$R^1(I, J, t) = R^0(I, J, t) - Q_1[R^0(I, J, t)]. \tag{15}$$

This residual vector is measured for significance, and if it requires further
compression, a second mask $M_2$ is applied to the transformed vector to
produce a second truncated vector. This vector is quantized by the
second-stage VQ. If the best match codevector is $c^2_{i^2}$, the
approximation of $R^0(I, J, t)$ after the second stage is given by

$$Q_2[R^1(I, J, t)] = \Phi^{-1}[M_1^{-1}(c^1_{i^1}) + M_2^{-1}(c^2_{i^2})]. \tag{16}$$

The masking functions $M_1$ and $M_2$ are binary templates, which together
select the first few perceptually significant coefficients. The masking
function effectively creates a low-pass-filtered version of the signal by
discarding the higher frequency coefficients. The residue after the second
stage is given by

$$R^2(I, J, t) = R^0(I, J, t) - Q_2[R^1(I, J, t)]. \tag{17}$$

This residual vector is then split into smaller blocks $R^2(I_2, J_2, t)$ of
dimension $m_2 \times n_2$ for quantization by the subsequent stages. The
algorithm compares the vectors $R^2(I_2, J_2, t)$ with a threshold to
determine if they are significant enough to require transmission. The
significant blocks that require transmission are quantized by the third-stage
VQ, which gives an approximation $Q_3[R^2(I_2, J_2, t)]$. The process of
decomposition into smaller blocks and selective quantization is applied
recursively in the later stages, so that a good representation of the residual
signal $r^0(i, j, t)$ is obtained. The indices of the quantizers are entropy
coded by adaptive arithmetic coding [13].

The block diagram in figure 14 shows that operation of the decoder is not
complex. The variable-length decoder recovers the bit maps and the codevector
indices from the arithmetically encoded sequence. The decoders are constructed
with lookup tables. Based on complexity requirements, the lookup tables of the
first two stages can store either the transform-domain coefficients
$c^1_{i^1}$ and $c^2_{i^2}$ or the reconstruction vectors
$\Phi^{-1}[M_1^{-1}(c^1_{i^1})]$ and $\Phi^{-1}[M_2^{-1}(c^2_{i^2})]$. In the
first case, two additional operations, a padding operation ($M^{-1}$) and an
inverse transform operation ($\Phi^{-1}$), must be performed. In the second
case, memory requirements are significantly larger. If the later-stage vectors
are required, a direct table lookup is performed, using the indices, and the
low-dimension vector is added to the first-stage reconstructed vector in the
appropriate position. This process provides the reconstructed residual signal
$R^0(I, J, t)$, which is then passed to the motion-compensation stage to
produce the reconstructed frame.

In order to achieve a true variable rate, the encoder makes decisions about
the number of quantization stages required by each block. These decisions have
to be transmitted to the decoder for the encoded data to be decoded correctly.
The decisions are usually encoded as bit maps for each stage. For a
three-stage encoder, three bit maps are required for proper decoding. These
bit maps could require a significant portion of the bit budget if they are not
intelligently encoded. The bit map at the first stage is combined with the
motion vector information so that the number of bits is reduced. If the motion
vector of a block is nonzero, it is assumed that the block needs to be encoded
by at least the first stage; thus, the overhead first-stage flag bit for
blocks with nonzero motion vectors is eliminated. The bit budget for these bit
maps can be further reduced by the use of correlation between the second- and
third-stage bit maps. Implementation details of an RVQ-based video codec are
given by Kwon et al [14].

Figure 14. Video decoder based on residual vector quantization.
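
The saving on the first-stage bit map can be sketched as follows. The function
names and the data layout (one motion vector and one coded/not-coded decision
per block) are assumptions made for illustration and are not the bit-stream
format of the codec.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[int, int]


def pack_first_stage_flags(mvs: Sequence[MotionVector], coded: Sequence[bool]) -> List[int]:
    """Send a flag only for blocks with a zero motion vector.

    Blocks with a nonzero motion vector are assumed to be coded by the first stage.
    """
    return [int(c) for mv, c in zip(mvs, coded) if mv == (0, 0)]


def unpack_first_stage_flags(mvs: Sequence[MotionVector], flags: Sequence[int]) -> List[bool]:
    """Rebuild the full first-stage bit map at the decoder."""
    remaining = iter(flags)
    return [True if mv != (0, 0) else bool(next(remaining)) for mv in mvs]


MVS = [(0, 0), (1, -2), (0, 0), (0, 3)]
CODED = [False, True, True, True]
FLAGS = pack_first_stage_flags(MVS, CODED)      # two flags instead of four
RESTORED = unpack_first_stage_flags(MVS, FLAGS)
```

The decoder already has the motion vectors, so the implied flags cost nothing
to transmit.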

4.10.2  Performance  of  an  RVQ-Based  Video  Codec 

Simulation results are given for an RVQ-based video encoder with the following
parameters: the first-stage scalar quantizer (SQ) had 8 quantization levels,
the codebook size of the second-stage VQ was 16, and the codebook size of the
third-stage VQ was 128. The number of significant DCT coefficients used was 9.
Simulation results are given for the encoder/decoder operating at three
different very low bit rates. The results are compared with those obtained
with the H.263 codec [15,16]. For the H.263 codec, all negotiable options were
turned on (except for "PB frames"), so that we could make a reasonable
comparison with the RVQ codec (the PB frame mode can be easily incorporated in
the RVQ codec). In the H.263 codec, the quantization parameter (QP) for the
first frame was set to its maximum value, so that the codec would give the
best performance for the "intraframe." Since we were interested in the
steady-state characteristics and not the first few transient frames, the bits
consumed in the first frame were not included in the bit-rate calculations.
The sequences used were those of the popular "salesman" test sequence. Each of
these sequences has 8-bit pixels, with frame size 144x176, and the frame rate
was 10 frames per second. We evaluated performance by using the peak
signal-to-noise ratio (PSNR) measurement to compare the two coding methods.
The computation requirements for the RVQ codec include real-time DCT; these
requirements are the same as those of H.263. The transform-domain VQ has a
codebook of size 16; therefore, its computational complexity is small. Only
about 10 percent (average over all the sequences tested) of the blocks were
encoded by the last VQ stage.
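
For reference, the PSNR of a reconstructed frame with 8-bit pixels is
conventionally computed as $10 \log_{10}(255^2 / \mathrm{MSE})$. The sketch
below assumes frames given as flat sequences of pixel values; the function
name is hypothetical.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], reconstructed: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


# Every pixel off by one gray level: MSE = 1.
EXAMPLE = psnr([0, 0, 0, 0], [1, 1, 1, 1])
```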

Forty motion-compensated difference frames extracted from four different
sequences (in which the test sequences were not included) were used to train
the codebooks. The codebooks were trained in a three-step process. We first
designed the scalar quantizer using Lloyd's algorithm. We then generated the
second- and third-stage initial codebooks using the k-means algorithm.
Finally, we retrained the three quantizers using the entropy constraint in a
closed-loop manner to improve the rate-distortion performance.
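
The first training step can be illustrated with a minimal sketch of Lloyd's
algorithm for a scalar quantizer. The training samples and initial levels are
invented, and the entropy-constrained retraining of the third step is not
shown.

```python
from typing import List, Sequence


def lloyd(samples: Sequence[float], levels: Sequence[float], iterations: int = 20) -> List[float]:
    """Alternate nearest-level assignment and centroid update until the levels stop moving."""
    levels = list(levels)
    for _ in range(iterations):
        cells: List[List[float]] = [[] for _ in levels]
        for s in samples:
            j = min(range(len(levels)), key=lambda n: (s - levels[n]) ** 2)
            cells[j].append(s)
        updated = [sum(c) / len(c) if c else lv for c, lv in zip(cells, levels)]
        if updated == levels:
            break  # converged
        levels = updated
    return levels


LEVELS = lloyd([0.0, 1.0, 2.0, 10.0, 11.0, 12.0], [0.0, 12.0])
```

The same two alternating steps, applied to vectors instead of scalars, give
the k-means procedure used for the initial VQ codebooks.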

The rate-distortion performance of the two codecs is shown in figure 15(a),
which demonstrates that the RVQ codec outperformed H.263 at all three bit
rates. At very low bit rates, the PSNR for H.263 decreases drastically, while
that of the RVQ codec decreases gradually. The performance of the H.263 codec
for the salesman sequence decreases dramatically at very low bit rates,
because of the rather large motion in this sequence. In contrast, the RVQ
codec handles such sequences very well at very low bit rates. Figures 15(b),
(c), and (d) show the PSNR results of each reconstructed frame at three
different bit rates. Since no rate control is used, a variable-bit-rate (VBR)
bit stream with constant quality is generated. Figures 16(b), 17(b), and 18(b)
show the reconstructed 48th frames for the salesman sequence compressed at the
three different bit rates by the RVQ encoder. Figures 16(d), 17(d), and 18(d)
show the reconstructed frames compressed by the H.263 encoder at the same bit
rates. It can be clearly seen (especially at 5.4 kb/s) that H.263 suffers from
blocking and smoothing, while the output of the RVQ codec is of much better
visual quality.

Figure 15. Performance of video compression algorithms using vector
quantization: (a) rate-distortion performance, (b) PSNR results at 12 kb/s,
(c) PSNR results at 8.1 kb/s, and (d) PSNR results at 5.3 kb/s.

Figure 16. Results for bit rate of approximately 12 kb/s: (a) frame bit rate
for sequence, (b) RVQ codec, (c) quadtree-VQ codec, and (d) H.263 codec.

Figure 17. Results for bit rate of approximately 8.1 kb/s: (a) frame bit rate
for sequence, (b) RVQ codec, (c) quadtree-VQ codec, and (d) H.263 codec.

Figure 18. Results for bit rate of approximately 5.3 kb/s: (a) frame bit rate
for sequence, (b) RVQ codec, (c) quadtree-VQ codec, and (d) H.263 codec.


5.  Quadtree-Based  Vector  Quantization 


A quadtree is a hierarchical data structure used to represent regions, curves,
surfaces, and volumes. Representations of regions by a quadtree are achieved
by the successive subdivision of the image array into four equal quadrants.
This process is known as a regular decomposition of an array. An image is thus
decomposed into homogeneous regions with sides of lengths that are powers of
two. A tree of degree 4 (each nonleaf has four children) is generated to
represent the image in terms of its homogeneous regions. The root node
corresponds to the entire array, and each child of a node represents a
quadrant of the region represented by that node. Leaf nodes of the tree
correspond to those blocks for which no further subdivision is necessary. The
above segmentation procedure is known as a top-down construction of the
quadtree. Another possibility for constructing a quadtree is a bottom-up
procedure, where small blocks are merged together recursively to form a larger
block if they are homogeneous with respect to the merging criterion.
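
A top-down construction can be sketched as a short recursion. The sketch
assumes a square image whose side is a power of two and uses block variance as
a stand-in homogeneity criterion; the threshold, the test image, and the
function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Leaf = Tuple[int, int, int]  # (row, column, side length)


def variance(image: Sequence[Sequence[float]], r: int, c: int, size: int) -> float:
    vals = [image[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def decompose(image: Sequence[Sequence[float]], r: int, c: int, size: int,
              threshold: float, min_size: int) -> List[Leaf]:
    """Split a block into four quadrants until it is homogeneous or of minimum size."""
    if size <= min_size or variance(image, r, c, size) <= threshold:
        return [(r, c, size)]
    half = size // 2
    leaves: List[Leaf] = []
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        leaves += decompose(image, r + dr, c + dc, half, threshold, min_size)
    return leaves


IMAGE = [
    [0, 9, 5, 5],
    [9, 0, 5, 5],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
LEAVES = decompose(IMAGE, 0, 0, 4, threshold=1.0, min_size=1)
```

Only the busy top-left quadrant is split down to single pixels; the three flat
quadrants remain 2x2 leaves.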

The regular decomposition method does not necessarily correspond to the
segmentation of the image into maximal homogeneous regions. It is likely that
unions of adjacent blocks form homogeneous regions. To obtain these maximal
homogeneous regions, we must allow the merging of adjacent blocks. However,
the resulting partition will no longer be represented by a quadtree; instead,
the final representation is in the form of an adjacency graph. (An alternative
method to obtain maximal homogeneous regions is to use a decomposition
technique that is not regular: that is, it segments the image into rectangular
blocks of arbitrary size. Such a method would require a different coding
procedure for each block size.) Here, we use a regular decomposition method
because the resulting blocks are square; this method reduces the complexity of
the encoder and the decoder, as well as the number of bits required to
represent the binary quadtree. The homogeneous regions so obtained are thus
not necessarily maximal.

5.1  Quadtree  Decomposition 

A quadtree decomposition results in an unbalanced tree structure with leaf
nodes of different sizes. In a regular decomposition, the leaf nodes are
restricted to square blocks. It is further possible to restrict the sides of
the leaf nodes to a small range of values. Such a restriction results in a
tree structure with the leaf nodes being square blocks with a maximum of $n$
different sizes. A signal decomposed by the above method can be compressed by
vector quantization of the leaf blocks. This will require $n$ different VQs
corresponding to the different block sizes. All leaf nodes of the same size
are quantized by one VQ, as shown in figure 19.

The choice of criterion used in the quadtree decomposition is one of the most
important factors in the design of a quadtree-based VQ.


Figure  19.  Vector  quantization  of  quadtree  leaf  nodes. 


5.2  Optimal  Quadtree  in  the  Rate-Distortion  Sense 

Quadtree decomposition for vector quantization can take into account the
distortion introduced by the VQ. An optimal decomposition algorithm, in the
rate-distortion sense, was introduced by Sullivan and Baker [17]. The
algorithm attempts to minimize a constrained error function defined as
follows:

$$J^q = d(X, Q^q(X)) + \lambda \cdot b^q, \tag{18}$$

where $d$ is the distortion introduced in quantizing the block $X$ with a
particular tree structure $q$, and $b^q$ is the total number of bits used to
represent the tree and the quantization indices. The Lagrangian multiplier
$\lambda$ controls the trade-off between the bit rate and distortion; it
determines the operating point on the R-D curve, as explained in section
4.8.2.

Consider a block $X^m$ of size $2^m \times 2^m$ and its descendants
$X_i^{m-1}$, $i = 1 \ldots 4$. Let the distortion of quantizing the blocks
$X_i^{m-1}$ with the optimal quantizer be $d_i^{m-1}$, and the number of bits
necessary to optimally quantize $X_i^{m-1}$ be $b_i^{m-1}$. Similarly, let the
distortion of quantizing the block $X^m$ be $d^m$, and the number of bits be
$b^m$. The four blocks $X_i^{m-1}$ are merged and coded as a single block of
size $2^m \times 2^m$ if the following condition is true:

$$d^m + \lambda b^m < \sum_i d_i^{m-1} + \lambda \sum_i b_i^{m-1}. \tag{19}$$

The above criterion can be used to prune the tree in a bottom-up manner to
obtain the R-D optimized hierarchical quantization scheme.
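
Condition (19) amounts to a one-line comparison of Lagrangian costs, sketched
below with invented distortion and bit figures; the function name is
hypothetical.

```python
from typing import Sequence


def should_merge(d_parent: float, b_parent: float,
                 d_children: Sequence[float], b_children: Sequence[float],
                 lam: float) -> bool:
    """True if coding the parent block costs less than coding its four children."""
    merged_cost = d_parent + lam * b_parent
    split_cost = sum(d_children) + lam * sum(b_children)
    return merged_cost < split_cost


# Parent: distortion 20 with 8 bits. Children: total distortion 8 with 24 bits.
MERGE_AT_LOW_RATE = should_merge(20.0, 8, [2.0, 1.0, 3.0, 2.0], [6, 6, 6, 6], lam=1.0)
MERGE_AT_HIGH_RATE = should_merge(20.0, 8, [2.0, 1.0, 3.0, 2.0], [6, 6, 6, 6], lam=0.5)
```

With the larger multiplier the cheaper parent block wins (28 against 32); with
the smaller one the four children are kept (24 against 20). Applying the test
from the smallest blocks upward prunes the whole tree.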

5.3 Video Compression Using Quadtree-Based Vector Quantization

The residual signal generated by the motion-compensation algorithm can be
compressed by the quadtree VQ [18]. The residual signal $r^0(i, j, t)$ is
divided into blocks of size $2^m \times 2^m$. Each of these blocks is encoded
by the R-D optimized, quadtree-based VQs. A $k$-stage hierarchical VQ uses $k$
VQs, which work on blocks of size $2^n \times 2^n$,
$n = m, \ldots, m - k + 1$. The quadtree bitmap is encoded as shown in figure
20.

We have implemented a video compression system using the quadtree VQ described
here. A simulation framework similar to that used for the RVQ video codec was
used in evaluating the performance of the quadtree-VQ-based video compression
algorithm. The residual signal after motion compensation was compressed by a
three-stage quadtree VQ. The three quantizers used blocks of size 16x16, 8x8,
and 4x4, respectively. Results are given for an encoder that uses scalar
quantizers for blocks of size 16x16 and 8x8. Blocks of size 4x4 are compressed
by a VQ trained by the generalized Lloyd algorithm (GLA). The quadtree was
segmented with the R-D optimized algorithm. We tested the performance using
the "salesman" sequence at three very low bit rates, as we did for the
motion-compensated RVQ video codec. The performance of the quadtree-VQ
compression algorithm was numerically similar to that of the RVQ compression
algorithm. Figure 15 shows the PSNR results of each reconstructed frame at
three different bit rates. Figures 16(c), 17(c), and 18(c) show the
reconstructed 48th frames for the salesman sequence compressed at the three
different bit rates by the quadtree-VQ encoder. As these figures show, the
performance of the quadtree-VQ encoder is similar to that of the RVQ encoder
at all the bit rates.


Code: 1 ( 1 (0010) 0 1 (1000) 1 (1000) )


Figure  20.  Encoding  quadtree  data  structure. 
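
One way to produce such a code is a depth-first traversal that writes 1 for a
split node and 0 for a leaf, and writes nothing for blocks that are already at
the smallest size. The figure itself is not reproduced here, so this
convention, the tree below, and the name `encode_quadtree` are assumptions;
with them, the sketch yields the same digit sequence as the code above.

```python
from typing import Optional


def encode_quadtree(node: Optional[list], depth: int = 0, max_depth: int = 3) -> str:
    """Depth-first split flags: '1' for a split node, '0' for a leaf.

    A node is None (leaf) or a list of four child nodes. Blocks at max_depth
    cannot be split further, so no flag is written for them.
    """
    if depth == max_depth:
        return ""
    if node is None:
        return "0"
    return "1" + "".join(encode_quadtree(child, depth + 1, max_depth) for child in node)


SPLIT = [None, None, None, None]     # a block split into four smallest-size leaves
FIRST = [None, None, SPLIT, None]    # third quadrant split again
OTHER = [SPLIT, None, None, None]    # first quadrant split again
TREE = [FIRST, None, OTHER, OTHER]
BITS = encode_quadtree(TREE)
```

The decoder can rebuild the tree from the bit string alone because the
traversal order and the maximum depth are known to both sides.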

6. Conclusions


In this report, we have explored two vector-quantization-based video
compression algorithms. In our work we have identified two important areas
that can be exploited to improve upon existing coding methods:

1. Multiscale segmentation: different areas in the image are coded at
different scales. In vector quantization, this is equivalent to using
different block sizes for different areas.

2. Multirate coding: different areas in the image are coded at different
precisions, since not all areas of the image contain the same amount of
information.

We have used two different methods to achieve the above goals. In the
RVQ-based encoder, we use the successive-refinement paradigm to achieve a
variable rate. In the quadtree-VQ encoder, the rate variability is limited,
but this technique is superior to a successive-refinement technique because it
performs direct quantization. Both algorithms use variable block sizes. The
resulting performance of these two encoders is similar, and in our simulations
both outperformed the H.263 standard at the very low bit rates tested.